Model based speech pause detection
Abstract
This paper presents two new algorithms for robust speech pause detection (SPD) in noise. Our approach formulates SPD as a statistical decision theory problem for the optimal detection of noise-only segments, using the framework of model-based speech enhancement (MBSE). The advantages of this approach are that it performs well in high-noise conditions, all necessary information is available in MBSE, and no other features need to be computed. The first algorithm is based on a maximum a posteriori probability (MAP) test and the second on a Neyman-Pearson test. These tests make use of the spectral distance between the input vector and the composite spectral prototypes of the speech and noise models, as well as the probabilistic framework of the hidden Markov model. The algorithms are evaluated and shown to perform well against different types of noise at various SNRs.
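The difference between the two decision rules named in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a single scalar feature per frame (for example log-energy) with hypothetical Gaussian models for the noise-only and speech-plus-noise hypotheses, and it omits the spectral prototypes and the hidden Markov model entirely. All names and parameter values below are illustrative assumptions.

```python
import math
from statistics import NormalDist

# Hypothetical one-dimensional models (illustrative values, not from the paper).
# Equal variances are assumed so the likelihood ratio is monotone in x.
NOISE = NormalDist(mu=0.0, sigma=1.0)    # H0: noise only (pause)
SPEECH = NormalDist(mu=3.0, sigma=1.0)   # H1: speech plus noise


def log_pdf(dist: NormalDist, x: float) -> float:
    """Gaussian log-density, computed analytically to avoid log(0)."""
    z = (x - dist.mean) / dist.stdev
    return -0.5 * z * z - math.log(dist.stdev) - 0.5 * math.log(2.0 * math.pi)


def log_likelihood_ratio(x: float) -> float:
    """log p(x | H0) - log p(x | H1); large values favour a pause."""
    return log_pdf(NOISE, x) - log_pdf(SPEECH, x)


def map_is_pause(x: float, prior_noise: float = 0.5) -> bool:
    """MAP test: pick H0 when its posterior probability exceeds that of H1."""
    threshold = math.log((1.0 - prior_noise) / prior_noise)
    return log_likelihood_ratio(x) > threshold


def np_threshold(false_alarm_rate: float) -> float:
    """Neyman-Pearson threshold on x.

    A false alarm here means declaring a pause while speech is present.
    With equal variances and the speech mean above the noise mean, the
    likelihood-ratio test reduces to x < t with P(x < t | H1) = rate.
    """
    return SPEECH.inv_cdf(false_alarm_rate)


def np_is_pause(x: float, false_alarm_rate: float = 0.05) -> bool:
    """Neyman-Pearson test at a fixed false-alarm rate."""
    return x < np_threshold(false_alarm_rate)
```

The MAP rule needs a prior probability for noise-only frames and minimises the overall error probability under these assumed models, whereas the Neyman-Pearson rule needs no prior and instead fixes how often speech frames may be mislabelled as pauses.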
Related resources
Accurate endpointing with expected pause duration
In an online automatic speech recognition system, the role of the endpoint detector is to infer when a user has finished speaking a query. Accurate and low-latency endpoint detection is crucial for natural voice interaction. Classic voice activity detector (VAD) based approaches monitor the incoming audio and trigger when a sufficiently long pause is detected. Such approaches are typically limi...
Acoustic Feature Analysis and Discriminative Modeling of Filled Pauses for Spontaneous Speech Recognition
Most automatic speech recognizers (ASRs) concentrate on read speech, which is different from spontaneous speech with disfluencies. ASRs cannot deal with speech with a high rate of disfluencies such as filled pauses, repetitions, lengthening, repairs, false starts and silence pauses. In this paper, we focus on the feature analysis and modeling of the filled pauses “ah,” “ung,” “um,” “em,” and “h...
Sentence boundary detection of spontaneous Japanese using statistical language model and support vector machines
This paper presents two different approaches utilizing statistical language model (SLM) and support vector machines (SVM) for sentence boundary detection of spontaneous Japanese. In the SLM-based approach, linguistic likelihoods and occurrence of pause are used to determine sentence boundaries. To suppress false alarms, heuristic patterns of end-of-sentence expressions are also incorporated. On...
Sentence boundaries in text and pauses in speech: Correlation or confrontation?
The paper explores the interaction between sentence boundaries marked by annotators in transcriptions of Russian spontaneous speech and actual prosodic boundaries in the signal. The aim of the research is to investigate whether annotators’ prosodic competence allows them to correctly detect sentence boundaries in speech based on textual information only. We found that inter-annotator agreement ...
Multi-Channel l1 Regularized Convex Speech Enhancement Model and Fast Computation by the Split Bregman Method
A convex speech enhancement (CSE) method is presented based on convex optimization and pause detection of the speech sources. Channel spatial difference is identified for enhancing each speech source individually while suppressing other interfering sources. Sparse unmixing filters indicating channel spatial differences are sought by l1 norm regularization and the split Bregman method. A subdivi...
Publication date: 1997